Abstract


Our decisions reflect uncertainty in various ways. We take account of
the uncertainty embodied in the roll of the die; we less often take account
of the uncertainty of our belief that the die is fair. We need to take
account of both uncertain knowledge and our knowledge of uncertainty.
"Evidence" itself has been regarded as uncertain. We argue that point-valued
probabilities are a poor representation of uncertainty; that we need
not be concerned with uncertain evidence; that interval-valued probabilities
that result from knowledge of convex sets of distribution functions in
reference classes (properly) include Shafer's mass functions as a special
case; that these probabilities yield a plausible non-monotonic form of
inference (uncertain inference, inductive inference, statistical inference);
and finally that this framework provides a very nearly classical decision
theory, so far as it goes. It is unclear how global the principles (such
as minimax) that go beyond the principle of maximizing expected utility are.
REPRESENTING KNOWLEDGE AND EVIDENCE FOR DECISION*

One purpose (quite a few thinkers would say the main purpose) of
seeking knowledge about the world is to enhance our ability to make sound
decisions. An item of knowledge that can make no conceivable difference
with regard to anything we might do would strike many as frivolous. Whether
or not we want to be philosophical pragmatists in this strong sense with
regard to everything we might want to enquire about, it seems a perfectly
appropriate attitude to adopt toward artificial knowledge systems.

If it is granted that we are ultimately concerned with decisions, then
some constraints are imposed on our measures of uncertainty at the level of
decision making. If our measure of uncertainty is real valued, then it
isn't hard to show that it must satisfy the classical probability axioms.

For example, if an act has a real-valued utility \(U(E)\) if event \(E\) obtains,
and the same real-valued utility if the denial of \(E\) obtains (\(U(E) = U(-E)\)),
then the expected utility of that act must be \(U(E)\), and that must be the
same as \(c \cdot U(E) + d \cdot U(-E)\), where \(c\) and \(d\) represent the uncertainty
of \(E\) and \(-E\) respectively. But then we must have \(c + d = 1\).[1]
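The step to \(c + d = 1\) can be written out explicitly. The only assumption added here is that \(U(E) \neq 0\); if \(U(E) = 0\) the identity holds for any \(c\) and \(d\) and constrains nothing.

```latex
\begin{aligned}
U(E) &= c\,U(E) + d\,U(-E) && \text{(expected utility of the act)}\\
     &= c\,U(E) + d\,U(E)  && \text{(since } U(-E) = U(E)\text{)}\\
     &= (c + d)\,U(E)
\end{aligned}
\qquad\Longrightarrow\qquad c + d = 1 \quad \text{whenever } U(E) \neq 0.
```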

There are reasons for rejecting real-valued (i.e., strictly
probabilistic) measures of uncertainty, though not all the reasons that
have been adduced for doing so are cogent. One is that these probabilities
seem to embody more knowledge than they should. For example, if your beliefs
are probabilistic, and you assign a probability of .1 to a drawn ball's
being purple (on no evidence), and a probability of .2 to a second ball's
being purple on the evidence that the first one is, and regard pairs of
balls as "exchangeable",[2] then you should be 99% sure a priori that in the
infinitely long run, no more than HZ of the balls will be purple. You know
beyond a shadow of a doubt (with probability .99996), on no evidence at all,
that no more than half will be purple. (Kyburg, 1968)
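Part of the force of this example can be checked with elementary arithmetic. Assuming an infinite exchangeable sequence, de Finetti's representation makes .1 and \(.1 \times .2 = .02\) the first two moments of the long-run proportion \(\theta\) of purple balls, so its variance is .01. Cantelli's one-sided inequality then gives \(\mathrm{Prob}(\theta \ge 1/2) \le 1/17\) whatever the mixing distribution, i.e., at least about 94% confidence, on no evidence, that no more than half will be purple. The sharper figure in the text rests on the fuller argument of (Kyburg, 1968), which this sketch does not reproduce; the function name is illustrative.

```python
from fractions import Fraction

# Judgments stated in the text (exchangeable purple-ball draws).
p_first = Fraction(1, 10)              # Prob(P1), on no evidence
p_second_given_first = Fraction(1, 5)  # Prob(P2 / P1)

# Under de Finetti's representation these fix the first two moments of the
# unknown long-run proportion theta of purple balls.
mean = p_first                                  # E[theta] = Prob(P1)
second_moment = p_first * p_second_given_first  # E[theta^2] = Prob(P1 & P2)
variance = second_moment - mean ** 2

def cantelli_upper_tail(threshold):
    """Distribution-free upper bound on Prob(theta >= threshold) (Cantelli's inequality)."""
    k = threshold - mean
    if k <= 0:
        return Fraction(1)  # the inequality says nothing at or below the mean
    return variance / (variance + k ** 2)

bound_half = cantelli_upper_tail(Fraction(1, 2))
print(f"E[theta] = {mean}, Var[theta] = {variance}")
print(f"Prob(theta >= 1/2) <= {bound_half} (about {float(bound_half):.4f})")
```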

Peter Cheeseman (1985) has given a defense of classical probability,
and perhaps would not find even such results as the foregoing distasteful.
But it is hard to see how to defend the real-valued point of view from
charges of subjectivity. Cheeseman refers to an "ideal" observer, but
offers us no guidance in how to approach ideality, nor any characterization
of how the ideal observer differs from the rest of us. It is therefore
quite unclear what the ideal observer offers us, other than moral support:
each of us is no doubt convinced that the ideal observer assigns
probabilities just as he himself does. One man's subjectivity is another man's
rational insight. And there is clearly no guidance here for the
construction of programs that represent probabilities.

There are other ways of representing uncertainty than by real numbers
between 0 and 1. If these uncertainties are to be used in making decisions,
however, they must be compatible with classical point-valued probabilities.
My preference is for intervals, because they can be based on objective
knowledge of distributions, and because this compatibility is demonstrable.
(Kyburg, 1983)

In what follows, I will sketch the properties of interval-valued
epistemic probability, and exhibit a structure for knowledge representation
that allows for both uncertain inference from evidence and uncertain
knowledge as a basis for decision. We need both uncertain knowledge and
knowledge of uncertainty. Along the way I make some comparisons to other
approaches.
1. Probability.

Probability is a function from statements and sets of statements to
closed subintervals of [0,1]. The sets of statements represent hypothetical
bodies of knowledge. The idea behind \(\mathrm{Prob}(S,K) = [p,q]\) is that someone
whose body of knowledge is \(K\) should, ought to, have a 'degree' of belief in
\(S\) characterized by the interval \([p,q]\). The cash value of having such a
'degree' of belief is that he should not sell a ticket that returns to the
purchaser $1.00 for less than \(100p\) cents, and he should not buy such a
ticket for more than \(100q\) cents. The relation in question is construed as a
purely objective, logical relation.
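The betting reading of an interval can be sketched directly. The function below is a hypothetical helper, not part of the paper: given \(\mathrm{Prob}(S,K) = [p,q]\) and a price in cents for a ticket that pays $1.00 if \(S\), it reports whether selling and buying at that price are permitted. Prices within the interval leave the agent free to do either.

```python
from fractions import Fraction

def ticket_policy(lower, upper, price_cents):
    """Permitted trades for a $1.00 ticket on S when Prob(S, K) = [lower, upper].

    Per the text: do not sell for less than 100*lower cents, and do not buy
    for more than 100*upper cents.
    """
    if not 0 <= lower <= upper <= 1:
        raise ValueError("need 0 <= lower <= upper <= 1")
    return {
        "may_sell": price_cents >= 100 * lower,
        "may_buy": price_cents <= 100 * upper,
    }

interval = (Fraction(3, 10), Fraction(2, 5))  # e.g. Prob(S, K) = [.3, .4]
for price in (25, 35, 45):
    print(price, ticket_policy(*interval, price))
```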

Every probability can be based on knowledge of statistical
distributions or relative frequencies, since statements known to have the
same truth value receive the same probability, and every such equivalence
class of statements (we can show) contains some statements of the
appropriate form. This statistical knowledge may be both uncertain and
approximate (we may be practically sure that between 30% and 40% of the balls
are black), but it is objective in the sense that any two people having the
same evidence should have the same knowledge.

Classical point-valued probabilities constitute a special case,
corresponding to the extreme hypothetical (and unrealistic) case in which \(K\)
embodies exact statistical knowledge.

The connection between statements and frequencies is given by a set of
formal procedures for finding the right reference class for a given
statement. The reference set may be multi-dimensional: the set of urns,
each paired with the set of draws made from it. It may be only
"accidentally" related to the sentence, as when we predict the act of someone
who makes a choice on the basis of a coin toss. What the right reference
class for a given statement \(S\) is depends (formally and objectively) on what is
in \(K\), our body of knowledge. In some cases we can implement a procedure for
finding the right reference class. (Loui, forthcoming.2)
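Loui's procedure is not reproduced here. As a rough, hypothetical illustration of how the choice of reference class can depend formally on what is in \(K\), the sketch below applies two simplified rules: a class is set aside when a strictly more specific class reports a conflicting (non-nested) interval, and among the remaining classes an interval nested inside all the others is preferred. The classes, attributes, numbers, and the fallback rule are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    attributes: frozenset  # properties that define the reference class
    interval: tuple        # known bounds on the frequency of the target property

def nested(i, j):
    """True if interval i lies inside interval j."""
    return j[0] <= i[0] and i[1] <= j[1]

def conflicts(i, j):
    return not (nested(i, j) or nested(j, i))

def choose_interval(candidates):
    # Specificity (simplified): drop a class if a strictly more specific class
    # (more defining attributes) reports a conflicting interval.
    survivors = [c for c in candidates
                 if not any(d.attributes > c.attributes and conflicts(d.interval, c.interval)
                            for d in candidates)]
    # Prefer an interval nested in every other survivor's interval.
    for c in survivors:
        if all(nested(c.interval, d.interval) for d in survivors):
            return c.interval
    # Hypothetical fallback: the smallest interval covering all survivors.
    return (min(c.interval[0] for c in survivors), max(c.interval[1] for c in survivors))

balls = Candidate("balls", frozenset({"ball"}), (0.30, 0.40))
urn3 = Candidate("balls in urn 3", frozenset({"ball", "urn3"}), (0.60, 0.70))
urn3_nested = Candidate("balls in urn 3", frozenset({"ball", "urn3"}), (0.32, 0.38))
print(choose_interval([balls, urn3]))         # conflict: the specific class wins
print(choose_interval([balls, urn3_nested]))  # no conflict: the nested interval is used
```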

It is natural to suppose that statistical knowledge in \(K\) is represented
by the attribution to each reference set of a convex set of distributions.
For example, we have every reason in the world to suppose that heads among
coin-tosses in general is nearly binomial, with a parameter close to a half.
(We have no reason to suppose that the parameter has the real value
0.49999....) Or we may have good reason to believe that two quantities are
uncorrelated in their joint distribution. Or that we can rule out certain
classes of extreme distributions. We can know of a certain bent coin that
heads will be binomially distributed in sequences of its tosses, with a
parameter \(p\) at least equal to a half.
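For the bent coin, such a convex set already fixes interval-valued probabilities for compound events. Because the probability of an event under a mixture is an average of its probabilities under the components, the lower and upper values over the set are reached at pure binomials; the sketch approximates them with a grid over the parameter. The event "at least 2 heads in 3 tosses" and the function names are illustrative choices.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of at least k heads in n tosses with parameter p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def interval_at_least(k, n, p_low, p_high, steps=1000):
    """Lower/upper probability over binomials with parameter in [p_low, p_high].

    Mixtures of these binomials form the convex set; a mixture's probability
    is an average of its components', so a grid over pure binomials suffices.
    """
    grid = [p_low + (p_high - p_low) * i / steps for i in range(steps + 1)]
    values = [prob_at_least(k, n, p) for p in grid]
    return min(values), max(values)

# The bent coin of the text: parameter at least one half.
lo, hi = interval_at_least(k=2, n=3, p_low=0.5, p_high=1.0)
print(f"Prob(at least 2 heads in 3 tosses) in [{lo:.3f}, {hi:.3f}]")
```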

Henceforth, we assume convexity. Here are some immediate results:[5]

(1) If \(\mathrm{Prob}(S,K) = [p,q]\), then \(\mathrm{Prob}(-S,K) = [1-q, 1-p]\).

(2) If \(-(S \,\&\, T)\) is in \(K\), and \(P(S) = [p_1, q_1]\) and \(P(T) = [p_2, q_2]\),
and \(P(T \lor S) = [p, q]\), then there are numbers in \([p_1, q_1]\) and \([p_2, q_2]\)
whose sum is in \([p, q]\). To see that \([p, q]\) can be a proper subset of
\([p_1 + p_2, q_1 + q_2]\), consider a die that you know to be biassed toward the one
at the expense of the two, or toward the two at the expense of the one.
A reasonable probability for the disjunction "one or two" would be very close
to 1/3, even though the reasonable probabilities for the one and for the two
would be significantly spread above and below 1/6.

(3) We can show that, given any finite set \(\mathcal{S}\) of sentences and a body of
knowledge \(K\), there exists a Bayesian function \(B\), satisfying the classical
probability axioms, such that for every sentence \(S\) in \(\mathcal{S}\),
\(B(S) \in \mathrm{Prob}(S,K)\).

(4) Let \(K_E\) be the body of knowledge obtained from \(K\) when evidence \(E\) is
added to \(K\). If \(E\) is among the finite set of sentences in question, then
there may be no Bayesian function \(B\) satisfying both \(B(S) \in \mathrm{Prob}(S,K)\) and
\(B(S/E) \in \mathrm{Prob}(S,K_E)\): classical conditionalization is not the only way of
updating probabilities.[6]

(5) There are non-trivial cases in which algorithms for computing
probabilities (i.e., for picking the right reference class) have been
provided. (Loui, forthcoming.2)
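Results (1) to (3) can be checked on the biassed-die example from (2). The sketch assumes, hypothetically, that the bias moves at most 1/12 of probability between faces 1 and 2 and that the other faces are fair; the convex set is then every mixture of the two extreme dice, and since each event's probability is linear in the mixture, the extremes suffice. It prints the intervals for "one", "two", and "one or two", checks the complement rule, and confirms that the fair die is a classical function lying inside every interval.

```python
from fractions import Fraction as F
from itertools import combinations

FACES = range(1, 7)
d = F(1, 12)  # hypothetical bound on the bias between faces 1 and 2

def die(t):
    """Distribution with probability t moved from face 2 to face 1."""
    probs = {f: F(1, 6) for f in FACES}
    probs[1] += t
    probs[2] -= t
    return probs

EXTREMES = [die(-d), die(d)]  # the convex set is every mixture of these two

def prob(event, dist):
    return sum(dist[f] for f in event)

def interval(event):
    values = [prob(event, dist) for dist in EXTREMES]
    return min(values), max(values)

one, two, one_or_two = {1}, {2}, {1, 2}
print(interval(one), interval(two), interval(one_or_two))

# (1) complement: Prob(-S) = [1 - q, 1 - p]
lo, hi = interval(one)
assert interval(set(FACES) - one) == (1 - hi, 1 - lo)

# (2) the disjunction's interval is a proper subset of the summed intervals
sum_lo = interval(one)[0] + interval(two)[0]
sum_hi = interval(one)[1] + interval(two)[1]
print("sum of intervals:", (sum_lo, sum_hi))

# (3) one classical (Bayesian) function fits every interval: here, the fair die
fair = die(F(0))
events = [set(c) for r in range(7) for c in combinations(FACES, r)]
assert all(interval(e)[0] <= prob(e, fair) <= interval(e)[1] for e in events)
```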

2. Updating.

A problem that has attracted a lot of attention is the problem of
updating probabilities in the light of new evidence. A related problem is
that of dealing with "uncertain" evidence.[7] The problem of uncertain
evidence can be avoided by mechanical procedures in two well known
formalisms. From a strictly Bayesian point of view, updating should take
place by Jeffrey's rule, \(P'(H) = P(H/E) \cdot P'(E) + P(H/-E) \cdot P'(-E)\) (Jeffrey,
1965). The rule is not uncontroversial (Levi, 1967), but in those cases
where it seems plausible, we can achieve the same result by conditioning on
a piece of "certain" evidence that we expand our algebra to accommodate.
Similarly, it has been shown that the same trick will work with Glenn
Shafer's well known mathematical theory of evidence (Shafer, 1976): so long
as the evidence can be represented by a separable support function, we can
mechanically replace general combination of support functions by Dempster
conditioning, Shafer's analog to Bayesian conditionalization. (Kyburg,
forthcoming.1)
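The trick of expanding the algebra can be checked numerically. The sketch computes Jeffrey's rule directly, then recovers the same value by ordinary conditioning on a new proposition \(O\) (a hypothetical "certain" observation) whose likelihoods are chosen so that \(P(E/O) = P'(E)\), assuming \(H\) is independent of \(O\) given \(E\) and given \(-E\). The numbers and function names are illustrative, and \(0 < P(E) < 1\) is assumed.

```python
from itertools import product

def jeffrey(p_h_given_e, p_h_given_not_e, new_p_e):
    """Jeffrey's rule: P'(H) = P(H/E) P'(E) + P(H/-E) P'(-E)."""
    return p_h_given_e * new_p_e + p_h_given_not_e * (1 - new_p_e)

def via_expanded_algebra(p_e, p_h_given_e, p_h_given_not_e, new_p_e):
    """Same update by conditioning on a hypothetical proposition O added to the algebra."""
    like_e = new_p_e / p_e                  # chosen so that P(E/O) = new_p_e
    like_not_e = (1 - new_p_e) / (1 - p_e)
    scale = max(like_e, like_not_e)         # keep likelihoods within [0, 1]
    like = {True: like_e / scale, False: like_not_e / scale}
    p_h = {True: p_h_given_e, False: p_h_given_not_e}
    p_e_of = {True: p_e, False: 1 - p_e}

    # Joint distribution over (E, H, O), with H independent of O given E or -E.
    joint = {}
    for e, h, o in product([True, False], repeat=3):
        p = p_e_of[e]
        p *= p_h[e] if h else 1 - p_h[e]
        p *= like[e] if o else 1 - like[e]
        joint[(e, h, o)] = p
    p_o = sum(v for (_, _, o), v in joint.items() if o)
    p_h_and_o = sum(v for (_, h, o), v in joint.items() if h and o)
    return p_h_and_o / p_o  # ordinary conditionalization on O

print(jeffrey(0.8, 0.2, 0.7))
print(via_expanded_algebra(0.3, 0.8, 0.2, 0.7))
```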

The relation between Shafer's theory and the system of probability just
outlined is interesting. Let \(\Theta\) be a possibility space, with support
function \(s\) defined on it. Shafer also defines a plausibility function \(p\):
for every subset \(S\) of \(\Theta\), \(p(S) = 1 - s(\Theta - S)\). Of course subsets of a
possibility space correspond exactly to propositions, and we can construct a
convex set of probability functions over these propositions such that the
minimum and maximum probabilities assigned to a proposition are exactly the
support and plausibility of the corresponding subset of \(\Theta\). (Kyburg,
forthcoming.1)
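The construction can be sketched for a small frame. Given a hypothetical mass function (Shafer's basic probability assignment) on \(\Theta = \{a, b, c\}\), the convex set consists of all ways of distributing each focal set's mass among its elements. The sketch enumerates the point allocations, whose mixtures make up the set, and checks that the minimum and maximum probability of each subset equal its support and plausibility. Names and masses are illustrative.

```python
from fractions import Fraction as F
from itertools import product

THETA = frozenset("abc")
# A hypothetical mass function on subsets of THETA.
MASS = {
    frozenset("a"): F(3, 10),
    frozenset("ab"): F(2, 10),
    frozenset("bc"): F(1, 10),
    THETA: F(4, 10),
}

def support(subset):
    """Shafer support: total mass of focal sets contained in the subset."""
    return sum((m for focal, m in MASS.items() if focal <= subset), F(0))

def plausibility(subset):
    return 1 - support(THETA - subset)

def compatible_distributions():
    """Point distributions sending each focal set's mass to one of its elements."""
    focal_sets = list(MASS)
    for choice in product(*(sorted(f) for f in focal_sets)):
        dist = {x: F(0) for x in THETA}
        for focal, x in zip(focal_sets, choice):
            dist[x] += MASS[focal]
        yield dist

def prob_range(subset):
    values = [sum(d[x] for x in subset) for d in compatible_distributions()]
    return min(values), max(values)

for subset in (frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("bc")):
    assert prob_range(subset) == (support(subset), plausibility(subset))
    print(sorted(subset), support(subset), plausibility(subset))
```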

But the converse doesn't hold. Consider a compound experiment
consisting of either (1) tossing a fair coin twice, or (2) drawing a coin
from a bag containing 40% two-headed and 60% two-tailed coins and tossing it
twice. The two alternatives are performed in some unknown proportion \(r\) (the
frequency of alternative (1)). Let \(A\) be the event that the first toss lands
heads, and \(B\) the event that the second toss lands tails. The representation
by a convex set of probability functions is straightforward:

\(P(TT) = r/4 + 0.6(1-r)\)

\(P(TH) = r/4\)

\(P(HT) = r/4\)

\(P(HH) = r/4 + 0.4(1-r)\)

The convex set of probability measures over the sample space is just the
set of these values for \(r \in [0,1]\). Let this set be \(\mathcal{SP}\). Then
\(P_*(S) = \min\{P(S) : P \in \mathcal{SP}\}\) is not a support function, by theorem 2.1
of (Shafer, 1976). (Kyburg, forthcoming.1)
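The counterexample can be verified mechanically. Shafer's support functions are exactly the functions whose Möbius inverse (the implied mass on each subset) is never negative. The sketch takes the lower envelope of the family above (each \(P(S)\) is linear in \(r\), so \(r = 0\) and \(r = 1\) suffice), computes its Möbius inverse, and lists the subsets that would need negative mass. It also checks that \(A\) and \(B\) violate \(P_*(A \cup B) + P_*(A \cap B) \ge P_*(A) + P_*(B)\), an inequality every support function satisfies. Function names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations

OUTCOMES = ("TT", "TH", "HT", "HH")

def distribution(r):
    """Outcome probabilities when the fair-coin alternative has frequency r."""
    return {"TT": r / 4 + F(3, 5) * (1 - r), "TH": r / 4,
            "HT": r / 4, "HH": r / 4 + F(2, 5) * (1 - r)}

# Each P(S) is linear in r, so the lower envelope over [0, 1] is at an endpoint.
EXTREMES = [distribution(F(0)), distribution(F(1))]

def subsets(items):
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def lower(event):
    """Lower probability P_*(event) over the convex set."""
    return min(sum((d[o] for o in event), F(0)) for d in EXTREMES)

def mobius(event):
    """Moebius inverse of the lower envelope: the mass a support function would need."""
    return sum((-1) ** (len(event) - len(b)) * lower(b) for b in subsets(tuple(event)))

masses = {e: mobius(e) for e in subsets(OUTCOMES)}
negative = {tuple(sorted(e)): m for e, m in masses.items() if m < 0}
print(negative)

A = frozenset({"HT", "HH"})  # first toss lands heads
B = frozenset({"TT", "HT"})  # second toss lands tails
print(lower(A | B) + lower(A & B), "<", lower(A) + lower(B))
```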

Finally, let \(\mathcal{P}(e)\) be the set of probability functions resulting from
conditionalizing the members of \(\mathcal{P}\) on \(e\). That is, if \(P\) belongs to \(\mathcal{P}\),
then the function \(P(x/e)\), defined for every sentence \(x\) in the original
algebra,[8] will belong to \(\mathcal{P}(e)\), a convex set of classical
probability functions. Let \(CP_{le}\) be the corresponding lower-probability
function, and \(CP_{ue}\) the corresponding upper-probability function. (Neither
is a probability function; hence the hyphens are not accidental.) Let
\(DP_{se}\) be the support function obtained from the support function \(s\)
corresponding to \(\mathcal{P}\) by Dempster conditioning, i.e., Dempster's rule of
combination applied to the case where \(e\) receives unit support. Let
\(DP_{pe}\) be the corresponding plausibility function. Then

\[ CP_{le}(x) \le DP_{se}(x) \le DP_{pe}(x) \le CP_{ue}(x). \]

The inequalities are strict unless certain measures on subsets have the
value 0. When it comes to updating probabilities relative to evidence,
Shafer's procedure exaggerates the impact of evidence beyond its Bayesian
import. (Kyburg, forthcoming.1)
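The ordering can be illustrated on a small example. The sketch uses a hypothetical mass function on \(\{a, b, c\}\), conditions on \(e = \{a, b\}\) both ways, and reports the four quantities for \(x = \{a\}\). The Bayesian bounds are taken over the point allocations of the mass; this suffices because \(P(x/e)\) is a ratio of linear functions, whose extremes over a convex polytope occur at its vertices (here \(P(e) > 0\) throughout, as footnote 8 assumes). The Dempster interval comes out strictly narrower, which is the sense in which it exaggerates the impact of the evidence.

```python
from fractions import Fraction as F
from itertools import product

THETA = frozenset("abc")
MASS = {frozenset("a"): F(2, 10), frozenset("b"): F(1, 10),
        frozenset("ac"): F(3, 10), frozenset("bc"): F(1, 10), THETA: F(3, 10)}

def bel(mass, subset):
    return sum((m for focal, m in mass.items() if focal <= subset), F(0))

def pl(mass, subset):
    return sum((m for focal, m in mass.items() if focal & subset), F(0))

def dempster_condition(mass, evidence):
    """Dempster's rule of combination with the evidence given unit support."""
    out = {}
    for focal, m in mass.items():
        meet = focal & evidence
        if meet:
            out[meet] = out.get(meet, F(0)) + m
    total = sum(out.values())  # renormalize away mass that conflicts with the evidence
    return {focal: m / total for focal, m in out.items()}

def allocations(mass):
    focal_sets = list(mass)
    for choice in product(*(sorted(f) for f in focal_sets)):
        dist = {x: F(0) for x in THETA}
        for focal, x in zip(focal_sets, choice):
            dist[x] += mass[focal]
        yield dist

def bayes_conditional_range(mass, subset, evidence):
    """Min/max of P(subset / evidence) over the convex set of compatible distributions."""
    values = []
    for d in allocations(mass):
        p_e = sum(d[x] for x in evidence)
        if p_e > 0:
            values.append(sum(d[x] for x in subset & evidence) / p_e)
    return min(values), max(values)

e, x = frozenset("ab"), frozenset("a")
cp_le, cp_ue = bayes_conditional_range(MASS, x, e)
dm = dempster_condition(MASS, e)
dp_se, dp_pe = bel(dm, x), pl(dm, x)
print(cp_le, dp_se, dp_pe, cp_ue)
```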

But we can also specify exactly the conditions under which this form of
updating agrees with convex Bayesian conditionalization. If these
conditions are satisfied, then it makes sense to follow the Dempster-Shafer
formalism, especially when it is computationally simpler.

Bayesian conditionalization is not always the right way of updating
probabilities, however. A situation in which Bayesian conditionalization
should be given up appears in (Kyburg, forthcoming.2).

3. Uncertain Knowledge.

One problem that Bayesian and other approaches to uncertainty have is
that there is no formal way of representing the acquisition of knowledge.
We can represent the having of knowledge (by the assignment of probability 1
to the item), but since there is no way in which \(P(S/E)\) can be 1 unless \(P(S)\)
is already 1, conditionalization doesn't get us knowledge. This has been
noticed, of course; Cheeseman (1985, p. 1008) simply says, "A reasonable
compromise is to treat propositions whose probability is close to 0 or 1 as
if they are known with certainty...." But of course it is well known that
this cannot be done generally: the conjunction of a number of certainties is
a certainty, but the conjunction of a large enough number of "reasonable
certainties" in Cheeseman's sense is what he would have to consider an
impossibility.[9]
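The arithmetic behind this objection is easy to check. The sketch assumes, purely for illustration, that the "reasonable certainties" are probabilistically independent, each at probability .99, and counts how many it takes before their conjunction falls below .01, a practical impossibility by the same standard. The function name is illustrative.

```python
def smallest_n_below(p_each, floor):
    """Fewest independent near-certainties whose conjunction drops below floor."""
    n, joint = 0, 1.0
    while joint >= floor:
        n += 1
        joint *= p_each  # independence assumed: probabilities multiply
    return n, joint

n, joint = smallest_n_below(0.99, 0.01)
print(f"{n} statements at .99 each; their conjunction has probability {joint:.5f}")
```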

McCarthy and Hayes (1969) are seduced into following this primrose
path, when they suggest (p. 489): "If \(\theta_1, \theta_2, \ldots, \theta_n \vdash \theta\) is a possible
deduction, then \(\mathrm{probably}(\theta_1), \ldots, \mathrm{probably}(\theta_n) \vdash \mathrm{probably}(\theta)\) is also a
possible deduction." This is clearly ruled out on our scheme, and even
\(\mathrm{acceptable}(\theta_1), \ldots, \mathrm{acceptable}(\theta_n) \vdash \mathrm{acceptable}(\theta)\) is ruled out as a
consequent of the logical conditional. Many philosophers, of course, have
taken this for granted; but if we are to formalize uncertain inference at
all, we must somehow accommodate sets of conflicting statements. Purely
probabilistic rules of inference do this easily.

We can accommodate Cheeseman's intuition that we should accept what is
practically certain by considering two sets of sentences in the
representation of knowledge. One of them we will call the evidential
corpus, and denote by \(K_e\); the other we will call the corpus of practical
certainties, and denote by \(K_p\).

We will accept sentences into \(K_p\) if and only if their probability
relative to \(K_e\) is greater than \(p\). The conjunction of two statements that
appear in \(K_p\) will also appear in \(K_p\) only if the conjunction itself is
probable enough relative to \(K_e\). Thus \(K_p\) will not be deductively closed,
though we can prove that if a statement \(S\) appears in \(K_p\), and \(S\) entails \(T\), \(T\)
will also appear there. This reflects a natural feature of human inference:
we must have reason, not only to accept each premise in a complex argument,
but to accept the conjunction of the premises, in order to be confident of
the conclusion.
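The acceptance rule can be illustrated with a small lottery model. In this sketch statements are sets of possible worlds with point-valued probabilities (the special case of exact statistical knowledge); the level \(p = .95\) and the 100-ticket lottery are hypothetical. Two statements are each accepted while their conjunction is not, and a statement entailed by an accepted one (a superset of its worlds) is accepted too.

```python
from fractions import Fraction as F

TICKETS = range(1, 101)         # hypothetical fair lottery: one winner among 100
P_WORLD = F(1, len(TICKETS))    # each world is "ticket w wins"
THRESHOLD = F(95, 100)          # hypothetical level p of practical certainty

def prob(statement):
    """A statement is modelled as the set of worlds in which it is true."""
    return P_WORLD * len(statement)

def accepted(statement):
    """Membership in K_p: probability relative to the evidence exceeds p."""
    return prob(statement) > THRESHOLD

def loser_statement(excluded):
    """'The winning ticket is not among `excluded`.'"""
    return frozenset(w for w in TICKETS if w not in excluded)

s = loser_statement({1, 2, 3, 4})   # probability .96
t = loser_statement({5, 6, 7, 8})   # probability .96
both = s & t                        # their conjunction: probability .92
weaker = s | loser_statement({1})   # entailed by s: probability .99

print(accepted(s), accepted(t), accepted(both), accepted(weaker))
```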
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It  ii  relative  Co  the  practical  corpui,  that  we  make  our 
(practical)  deciaiona.  It  ia  thua  the  (convex  aeta  of)  diatributiona  — 
including  conditional  diatributiona  —  embodied  in  that  aet  of  atatemenCs 
that  we  uae  in  our  deciaion  theory. 

But  there  are  questiona.  What  ia  the  value  of  £  Chat  we  are  taking  aa 
practical  certainty?  How  do  statements  get  in  What  ia  the  decision 

theory  that  goes  with  this  kind  of  structure? 

Let us first consider the value of \(p\). Suppose the widest range of
stakes we can come up with is 99:1. For example, Sam and Sally are going to
bet on some event, each has $100, and neither has any change. Then a
probability value falling outside the range [.01, .99] would be useless as
a betting guide. A probability less than .01 would (in this context) amount
to a practical impossibility; one greater than .99 would amount to a
practical certainty.
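The link between stakes and the level of practical certainty can be checked by brute force. The sketch assumes, hypothetically, that the available bets split a $100 pot in whole dollars, so the most lopsided odds are 99:1 and the level is 99/(99 + 1) = .99. It shows that a probability above .99 recommends exactly the same bets as probability 1, while .98 does not. Function names are illustrative.

```python
def bet_on_event(q, stake_for, stake_against):
    """Expected gain from backing event E at stake_for:stake_against when Prob(E) = q."""
    return q * stake_against - (1 - q) * stake_for

# Hypothetical menu: the two bettors split a $100 pot in whole dollars,
# so the most lopsided odds available are 99:1.
BETS = [(s, 100 - s) for s in range(1, 100)]

def choices(q):
    """Which available bets on E have positive expected gain at probability q."""
    return tuple(bet_on_event(q, f, a) > 0 for f, a in BETS)

print(choices(0.995) == choices(1.0))  # above .99: behaves exactly like certainty
print(choices(0.98) == choices(1.0))   # below .99: some bets are declined
```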

The range of stakes can determine the level of "practical certainty" \(p\).
What counts as practical certainty depends on context, but in an explicit
way: it depends on what's at stake. This idea is developed in (Kyburg,
forthcoming.2).

How do statements qualify as evidence in \(K_e\)? Not by being "certain."
It can be argued that anything that was really incorrigible would have to be
devoid of empirical content.[10] (The worry about uncertain evidence is not
misplaced; it's just misconstrued.) One typical form of evidence statement
is this: "The length of \(x\) is \(d \pm r\) meters." Whatever our readings, these
statements are not "certain": they admit of error. The same is true of
all ordinary observation statements.

So a statement gets into \(K_e\) by having a low probability of being in
error; equally, by having a high probability (at least \(e\)) of being
veridical. How high? In virtue of the fact that conjunctions of pairs of
statements in \(K_e\) appear in \(K_p\), it seems plausible to take \(e = p^{1/2}\). For a
number of technical reasons (Kyburg, 1984) it turns out to be best to
construe the corpus containing the theory of error as metalinguistic. This
is as one might think: after all, the theory of error concerns the relation
between readings (e.g., numerals written in laboratory books) and values:
the real quantities characterizing things in the real world. For present
purposes we need note only that this is not the beginning of an infinite
regress. We can maintain objectivity; we can avoid "presuppositions" and
other unjustified assumptions.
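To make the error-theoretic reading concrete, here is a sketch assuming, hypothetically, normally distributed measurement error with a known standard deviation. It finds the half-width \(r\) for which "the length of x is d ± r meters" has probability \(e\) of being veridical. The evidential level .995 and the value of sigma are illustrative, not taken from the paper.

```python
from statistics import NormalDist

def half_width_for(level, sigma):
    """Half-width r with Prob(|error| <= r) = level, for N(0, sigma^2) errors."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided standard-normal quantile
    return z * sigma

sigma = 0.002   # hypothetical error standard deviation, in meters
level = 0.995   # hypothetical evidential level e
r = half_width_for(level, sigma)

error = NormalDist(0, sigma)
coverage = error.cdf(r) - error.cdf(-r)  # probability the reported interval is veridical
print(f"report 'length = d +/- {r:.4f} m' (veridical with probability {coverage:.5f})")
```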

4. Decision.

It has been objected (Seidenfeld, 1979) that there is no decision
theory that is tailored to Shafer's theory of evidential support. Indeed,
it is pretty clear that support functions alone would conflict with expected
utility. On the other hand, the reduction to convex sets of distributions
does show that we can have very nearly a normal decision theory using
Shafer's system. In computing the value of an act, we need to consider not
only the support assigned to various states of affairs (corresponding to
lower probabilities), but also the plausibilities (corresponding to upper
probabilities).


This is true for the more general convex set representation: we can
construct an interval of expected utility for each act. A natural
reinterpretation of the principle of dominance would take an alternative \(a_1\)
to dominate an alternative \(a_2\) whenever, for every possible frequency
distribution, the expectation of \(a_1\) is greater than the expectation of \(a_2\).
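A sketch of this dominance test, assuming a hypothetical two-state problem whose convex set is every distribution with \(P(s_1) \in [1/5, 4/5]\), and three acts with made-up utilities. Because the difference between two expectations is linear in the distribution, checking the two extreme distributions is enough. Note that \(a_1\) dominates \(a_2\) even though their expected-utility intervals overlap, so dominance in this sense is not the same as comparing intervals.

```python
from fractions import Fraction as F

# Hypothetical convex set: every P with P(s1) in [1/5, 4/5]; these are its extremes.
EXTREMES = [(F(1, 5), F(4, 5)), (F(4, 5), F(1, 5))]
ACTS = {"a1": (10, 2), "a2": (9, 1), "a3": (0, 10)}  # hypothetical utilities per state

def expectation(p, utilities):
    return sum(pi * u for pi, u in zip(p, utilities))

def eu_interval(act):
    """Interval of expected utility over the convex set."""
    values = [expectation(p, ACTS[act]) for p in EXTREMES]
    return min(values), max(values)

def dominates(x, y):
    """x dominates y if its expectation is higher under every distribution in the set."""
    return all(expectation(p, ACTS[x]) > expectation(p, ACTS[y]) for p in EXTREMES)

undominated = [a for a in ACTS if not any(dominates(b, a) for b in ACTS if b != a)]
print({a: eu_interval(a) for a in ACTS})
print(undominated)
```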

This eliminates some alternatives, but in general there will be a
number of courses of action that are not eliminated. What we do here is
another matter, one which is certainly worthy of further study. But it
seems natural that minimax and minimax regret strategies are appropriate
candidates for consideration under some conditions. There may well be
others, such as satisficing. And it may even be that the guidance provided
by the motto "eliminate dominated alternatives" is as far as rationality
alone takes us. Further pruning may depend on constraints that are local to
the individual decision problems.
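Two of the candidate rules can be sketched on a hypothetical two-state problem where \(P(s_1) \in [1/5, 4/5]\) and two undominated acts have made-up utilities: a minimax rule (read here as choosing the act whose lower expected utility is largest, i.e., whose worst-case expected loss is smallest) and minimax regret, with regret measured against the best act under each distribution. These illustrate the candidates the text mentions; they are not a recommendation among them.

```python
from fractions import Fraction as F

# Hypothetical convex set: every P with P(s1) in [1/5, 4/5]; these are its extremes.
EXTREMES = [(F(1, 5), F(4, 5)), (F(4, 5), F(1, 5))]
ACTS = {"a1": (10, 2), "a3": (0, 10)}  # hypothetical undominated acts

def expectation(p, utilities):
    return sum(pi * u for pi, u in zip(p, utilities))

def maximin_choice():
    """Choose the act with the largest lower expected utility."""
    return max(ACTS, key=lambda a: min(expectation(p, ACTS[a]) for p in EXTREMES))

def max_regret(act):
    """Largest shortfall from the best act over the convex set.

    Regret is a maximum of linear functions of P, hence convex, so its
    maximum over the set is reached at an extreme distribution.
    """
    return max(max(expectation(p, u) for u in ACTS.values()) - expectation(p, ACTS[act])
               for p in EXTREMES)

def minimax_regret_choice():
    return min(ACTS, key=max_regret)

print(maximin_choice(), {a: max_regret(a) for a in ACTS}, minimax_regret_choice())
```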

5. The Structure of Knowledge.

Were we to deal explicitly with our theory of error and its source, we
would have a complex structure consisting of four sets of sentences in two
distinct languages.[13] But for ordinary decision-theoretic purposes there
are just two sets of statements with which we need to be concerned: \(K_p\) and
\(K_e\). Evidence enters \(K_e\) when it is dependable enough, and \(K_e\) in turn
determines the practical certainties of \(K_p\). This renders the process of
uncertain inference by which any statement gets into \(K_p\) automatically
non-monotonic. As the contents of the evidential corpus \(K_e\) change, \(K_p\) may
change, contract, or expand. What is practically certain at one point may
cease to be practically certain in the light of new evidence, and in fact in
the light of new evidence may become evidently false.
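A toy illustration of this non-monotonicity follows. The reference classes, frequency bounds, and the level .9 are hypothetical, and "use the most specific matching class" is a simplification assumed for the example. Adding one item of evidence to \(K_e\) removes a practical certainty from \(K_p\) and puts its denial there instead, while the evidence itself is inherited by \(K_p\).

```python
P_LEVEL = 0.9  # hypothetical level of practical certainty

# Hypothetical statistical knowledge: bounds on the frequency of flying in
# nested reference classes, listed from least to most specific.
FLY_FREQUENCY = [
    (frozenset({"bird"}), (0.92, 0.97)),
    (frozenset({"bird", "penguin"}), (0.0, 0.01)),
]

def practical_corpus(evidential):
    """K_p inherits K_e and adds 'flies' or 'does not fly' when its lower
    probability, taken from the most specific matching class, exceeds P_LEVEL."""
    corpus = set(evidential)
    matching = [iv for cls, iv in FLY_FREQUENCY if cls <= evidential]
    if matching:
        lo, hi = matching[-1]  # classes are nested, so the last match is most specific
        if lo > P_LEVEL:
            corpus.add("flies")
        elif 1 - hi > P_LEVEL:
            corpus.add("does not fly")
    return corpus

k_e = frozenset({"bird"})
before = practical_corpus(k_e)
after = practical_corpus(k_e | {"penguin"})
print(sorted(before))
print(sorted(after))
```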
Another feature of the relation between the evidential corpus and the
practical corpus is that sentences in the evidential corpus are inherited by
the practical corpus. The practical corpus is thus an expansion of the
evidential corpus; but it is crucial to keep the two corpora distinct. If a
sentence were to be added to the evidential corpus when it got a high
probability relative to the evidential corpus, it could never be eliminated:
it would henceforth always have probability one relative to that evidential
corpus. The separation of the practical and evidential corpus is required to
preserve the non-monotonicity of uncertain inference.

The decision maker need be concerned directly only with the contents of
\(K_p\): that is what determines the (objective, frequency-based) probability
of the alternatives he must choose between. But he may be led to worry
about the contents of \(K_p\). What is there depends on the weight of the
combined evidence concerning it. This evidence is embodied in \(K_e\), and the
mode of combination flows from the definition of probability.

The scheme outlined does not give us a complete decision theory such as
we would get from a subjective Bayesian approach, but it may take us as far
as rationality can take us. The role of epistemological probability in
decision theory is supported by the theorem that for any finite set of
sentences there is a Bayesian belief function that fits the epistemological
probability intervals. Thus uncertain knowledge and knowledge of
uncertainty both find their place.
*Research for this paper was supported in part by the U.S. Army Signals
Warfare Laboratory.

1. It is this line of attack that lies behind the subjectivist approach to
probability established independently by F. P. Ramsey (1930) and Bruno
de Finetti (1937) and rendered respectable by L. J. Savage (1954).

2. If "\(P_i\)" is "Draw number \(i\) yields a purple ball," this is just to say
that for \(i \neq j\), \(\mathrm{Prob}(P_i)\) and \(\mathrm{Prob}(P_i \,\&\, P_j)\) do not depend on the values
of \(i\) and \(j\).

3. There is a tradition, represented by H. Jeffreys (1939), R. Carnap
(1950), and most recently E. T. Jaynes (1982), according to which the
subjectivity of precise probability assignments can be eliminated by
finding general principles for assigning probabilities to the
statements of a given language. But as Seidenfeld (forthcoming) has
shown, there are serious difficulties with the Maximum Entropy program
even beyond the fact that this approach just pushes the arbitrariness
into the choice of a language or classification.

4. Of course this is just one opinion among many as to what probability
"is". But I would hardly hold it if I did not think it correct.

5. Proofs may be found in (Kyburg, 1961), (Kyburg, 1974) and (Kyburg,
1983).

6. Counterillustration may be found in (Kyburg, forthcoming.3).

7. Simply as examples: (Duda, Hart, and Nilsson, 1976), (Garvey, Lowrance,
and Fischler, 1981), (Pearl, 1985), (Lowrance, 1982), (Quinlan, 1982).

8. We assume \(P(e) > 0\) for every \(P \in \mathcal{P}\); we also assume that there is a
support function \(s\) matching \(\mathcal{P}\).

9. This is the lottery paradox, first appearing in (Kyburg, 1961).

10. One normally believes one's own eyes, but one knows that hallucinations
do occur. It is hard to imagine any observational statements whose
veridicality could not be impugned by some imaginable course of
subsequent observations. Perhaps this is not true of phenomenological
reports: "Red patch here now." But I suspect these have no useful
content.

11. See Levi (1980) for a highly developed form of this approach.

12. Or perhaps this whole approach is wrong-headed. For the development of
an alternative, see (Loui, forthcoming.1).

13. Viz.: the practical corpus \(K_p\), the evidential corpus \(K_e\), the evidential
metacorpus \(MK_e\), and the a priori metacorpus \(MK_a\) containing
observational records and linguistic conventions.

14. Note that in a strict sense, \(K_p\) need not even be consistent; that is,
its deductive closure may be inconsistent in the ordinary sense. This
is illustrated by the lottery alluded to.